Skip to content

Remove content app redirect for manifest fetching#2358

Merged
gerrod3 merged 1 commit into
pulp:mainfrom
gerrod3:issue-1974
May 21, 2026
Merged

Remove content app redirect for manifest fetching#2358
gerrod3 merged 1 commit into
pulp:mainfrom
gerrod3:issue-1974

Conversation

@gerrod3
Copy link
Copy Markdown
Contributor

@gerrod3 gerrod3 commented May 6, 2026

Fixes #1974
https://redhat.atlassian.net/browse/PULP-486

To properly remove the redirect for manifest fetching I had to remove the PullThroughCaching redirect logic as well. Now the entirety of PullThroughCaching lives on the registry_api side, no more content app logic needed. These are the three scenarios we were handling for pull-through caching:

  1. Manifest tag fetch:
    Upon first fetch you need to create the tag and download the manifest and initiate a task to download the rest of the image in the background (tag=None, local_manifest=None). If the upstream manifest already exists on Pulp you simply need to create the tag and add it and the image's content to the repo (tag=None, local_manifest is not None). On the second fetch of a tag with pull-through you need to check to see if the tag has been updated on the remote. There are three scenarios for this: one - new upstream manifest that needs to be downloaded to Pulp (tag is not None, local_manifest=None), two - new upstream manifest already in Pulp needs to be added to repo (tag is not None, local_manifest is not None), and three - no changes on upstream (tag is not None, tag.manifest.digest == local_manifest.digest). If the tag doesn't exist on the remote, but we have the tag locally then we should serve our local copy (tag is not None, fetch_manifest raised ApiException).
  2. Manifest digest fetch:
    When fetching a manifest by digest we either have it or we don't. If we don't, we check the upstream if it's there. If yes, then we check if the manifest is in Pulp locally and just needs to be added to the repo, or if we need to download the whole image in the background.
  3. Blob fetch:
    When fetching a blob and we don't have it in the repository we check to see if the remote has the blob. If so, we create the blob as an on-demand content and redirect to the content-app for it to finish streaming the blob from the upstream. We also add it to the pending_blobs to speed up future requests.

📜 Checklist

  • Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
  • A changelog entry or entries has been added for any significant changes
  • Follows the Pulp policy on AI Usage
  • (For new features) - User documentation and test coverage has been added

See: Pull Request Walkthrough

@gerrod3 gerrod3 force-pushed the issue-1974 branch 4 times, most recently from 848ee41 to 5ad30c0 Compare May 8, 2026 15:45
@dralley dralley marked this pull request as draft May 12, 2026 01:04
@gerrod3 gerrod3 force-pushed the issue-1974 branch 4 times, most recently from 20d478f to a3b1c6e Compare May 13, 2026 22:14
@gerrod3 gerrod3 marked this pull request as ready for review May 14, 2026 19:14
blob.touch()
except models.Blob.DoesNotExist:
raise BlobNotFound(digest=pk)
if pk == EMPTY_BLOB:
Copy link
Copy Markdown
Contributor

@dralley dralley May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain (and maybe document) this a bit better? What is EMPTY_BLOB and why does it redirect?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a layer that has nothing in it. Each layer(blob) is a gzip-compressed tar and the empty blob is a no-op layer used in images, so we handle it specially and have the empty compressed bytes ready to serve if requested. See the _empty_blob() method in registry.py, there are still bytes that need to be served so that is why it was originally redirected to the content app.

Copy link
Copy Markdown
Contributor

@dralley dralley May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name pk is a bit confusing because the EMPTY_BLOB constant seems to be a checksum string

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to rename pk? It's just a path parameter of the url and it's equal to the blob's digest field which is the unique field of the blob, but not its pulp_id (db pk).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed necessarily, but it would be good to distinguish from pk in the database sense. Not blocking

blob.touch()
return redirects.issue_blob_redirect(blob)

def create_on_demand_blob(self, digest, remote, blob_url):
Copy link
Copy Markdown
Contributor

@dralley dralley May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing doc comments - would be nice if all functions added here had them

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, even in the pull-through cache, you are using on-demand?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, because we redirect to the content app we can use the content app's ability to stream the artifact from the remote while saving it to Pulp. So that is why we create the blob as on-demand before redirecting to the content app. Technically the content app also has the ability to create the content for pull-through at request time, but that logic is complicated and we already overwrite the content app for pulp-container and have special operations for pending_blobs, so I didn't think it was worth it to try to integrate into pulpcore's pull-through machinery.

Comment thread pulp_container/app/registry_api.py
Comment thread pulp_container/app/registry_api.py
fixes: pulp#1974
Assisted by: claude-sonnet-4.6
@gerrod3 gerrod3 merged commit 8479d52 into pulp:main May 21, 2026
13 of 14 checks passed
@gerrod3 gerrod3 deleted the issue-1974 branch May 21, 2026 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Omit redirects to content-app

2 participants